Load the necessary libraries
library(rstanarm) # for fitting models in Stan
library(brms) # for fitting models in Stan
library(coda) # for diagnostics
library(bayesplot) # for diagnostics
library(ggmcmc) # for MCMC diagnostics
library(rstan) # for interfacing with Stan
library(emmeans) # for marginal means etc
library(broom) # for tidying outputs
library(DHARMa) # for residual diagnostics
library(tidybayes) # for more tidying outputs
library(ggeffects) # for partial plots
library(broom.mixed) # for tidying MCMC outputs
library(tidyverse) # for data wrangling etc
library(patchwork) # for multiple plots
A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in response to the virus. In an attempt to account for this leaf-to-leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with half of each leaf inoculated with the weak strength virus and the other half inoculated with the strong strength virus. So the leaves were blocks, and each treatment was represented once in each block. A completely randomised design would have had 16 leaves, with 8 whole leaves randomly allocated to each treatment.
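The blocking structure described above can be sketched in R (a hypothetical layout constructed here for illustration, not read from the data file):

```r
# Sketch of the randomised complete block layout described above:
# 8 leaves (blocks), each receiving both virus strengths once.
design <- expand.grid(
  LEAF = paste0("L", 1:8),
  TREATMENT = c("Strong", "Weak")
)
table(design$LEAF) # two observations (one per treatment) per leaf
nrow(design)       # 16 leaf halves in total
```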
Tobacco plant
Format of tobacco.csv data files
| LEAF | TREATMENT | NUMBER |
|---|---|---|
| 1 | Strong | 35.898 |
| 1 | Weak | 25.02 |
| 2 | Strong | 34.118 |
| 2 | Weak | 23.167 |
| 3 | Strong | 35.702 |
| 3 | Weak | 24.122 |
| ... | ... | ... |
| LEAF | The blocking factor - Factor B |
| TREATMENT | Categorical representation of the strength of the tobacco virus - the main factor of interest (Factor A) |
| NUMBER | Number of lesions on that part of the tobacco leaf - response variable |
tobacco <- read_csv("../public/data/tobacco.csv", trim_ws = TRUE)
tobacco %>% glimpse()
## Rows: 16
## Columns: 3
## $ LEAF <chr> "L1", "L1", "L2", "L2", "L3", "L3", "L4", "L4", "L5", "L5", …
## $ TREATMENT <chr> "Strong", "Weak", "Strong", "Weak", "Strong", "Weak", "Stron…
## $ NUMBER <dbl> 35.89776, 25.01984, 34.11786, 23.16740, 35.70215, 24.12191, …
Along with ensuring that all categorical variables are specifically declared as factors, if we intend to use either rstanarm or brms, we also need to ensure that all varying effects are also declared as factors.
tobacco <- tobacco %>% mutate(
LEAF = factor(LEAF),
TREATMENT = factor(TREATMENT)
)
tobacco %>% head()
To explore the assumptions of homogeneity of variance and normality, a boxplot of each Treatment level is appropriate.
ggplot(tobacco, aes(y = NUMBER, x = TREATMENT)) +
geom_boxplot()
Conclusions:
It can also be useful to get a sense of the consistency across blocks (LEAF). That is, do all leaves have a similar baseline level of lesion susceptibility, and do they respond similarly to the treatment?
ggplot(tobacco, aes(y = NUMBER, x = as.numeric(LEAF))) +
geom_line(aes(linetype = TREATMENT))
## If we want to retain the original LEAF labels
ggplot(tobacco, aes(y = NUMBER, x = as.numeric(LEAF))) +
geom_blank(aes(x = LEAF)) +
geom_line(aes(linetype = TREATMENT))
Conclusions:
Given that there are only two levels of Treatment (Strong and Weak), it might be easier to visualise the differences in baselines and effect consistency by plotting as:
ggplot(tobacco, aes(y = NUMBER, x = TREATMENT, group = LEAF)) +
geom_point() +
geom_line(aes(x = as.numeric(TREATMENT)))
Conclusions:
The above figure also serves as a good way to visualise certain aspects of mixed effects models. When we fit a mixed effects model that includes a random blocking effect (in this case LEAF), we are indicating that we are allowing there to be a different intercept for each block (LEAF). In the current case, the intercept will represent the first Treatment level (Strong). So the random effect is specifying that the intercept can vary from Leaf to Leaf.
We can think of the model as having two tiers (a hierarchy), where the tiers of the hierarchy represent progressively smaller (typically) spatial scales. In the current example, the largest spatial units are the leaves (blocking factor). Within the leaves, there are the two Treatments (Strong and Weak) and within the Treatments are the individual observations.
From a frequentist perspective, this model might be referred to as a mixed effects model in which the TREATMENT is a ‘fixed’ effect and the LEAF is a ‘random’ effect. Such terminology is inconsistent in a Bayesian context since all parameters are ‘random’. Instead, these would be referred to as population-level and group-level effects respectively. Group-level effects are so called because they are effects that differ within each group (e.g. level of a blocking factor), whereas population effects are those that have been pooled across all groups.
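To make the varying-intercept idea concrete, here is a minimal simulation of the two-tier structure described above, in which each leaf gets its own baseline; all parameter values are plucked out of thin air purely for illustration:

```r
# Minimal simulation of the hierarchy: leaf-level baselines (group-level),
# a treatment effect (population-level) and observation-level noise.
set.seed(1)
n_leaf <- 8
leaf_baseline <- rnorm(n_leaf, mean = 0, sd = 3) # group-level (leaf) deviations
treat_effect <- c(Strong = 0, Weak = -8)         # population-level effect
sim <- expand.grid(LEAF = 1:n_leaf, TREATMENT = c("Strong", "Weak"))
sim$NUMBER <- 35 + leaf_baseline[sim$LEAF] +
  treat_effect[as.character(sim$TREATMENT)] +
  rnorm(nrow(sim), sd = 2)                       # observation-level noise
head(sim)
```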
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \bf{X_i}\boldsymbol{\beta} + \bf{Z_i}\boldsymbol{\gamma}\\ \beta_0 \sim{} \mathcal{N}(35, 20)\\ \beta_1 \sim{} \mathcal{N}(0, 10)\\ \boldsymbol{\gamma} \sim{} \mathcal{N}(0, \boldsymbol{\Sigma})\\ \boldsymbol{\Sigma} = \boldsymbol{D}(\sigma_l)\boldsymbol{\Omega}\boldsymbol{D}(\sigma_l)\\ \boldsymbol{\Omega} \sim{} \mathrm{LKJ}(\zeta)\\ \sigma_l \sim{} \mathrm{Cauchy}(0, 5)\\ \sigma \sim{} \mathrm{Gamma}(2, 1) \]
where:

- \(y_i\) is the number of lesions for the \(i\)th observation
- \(\bf{X_i}\) is the model matrix for the population-level effects (TREATMENT) and \(\boldsymbol{\beta}\) the corresponding effects parameters
- \(\bf{Z_i}\) is the model matrix for the group-level effects (LEAF) and \(\boldsymbol{\gamma}\) the corresponding varying effects
- \(\boldsymbol{\Sigma}\) is the covariance matrix of the group-level effects, decomposed into a diagonal matrix of standard deviations \(\boldsymbol{D}(\sigma_l)\) and a correlation matrix \(\boldsymbol{\Omega}\)
It turns out that it is difficult to apply a prior directly to a covariance matrix, so instead the covariance matrix is decomposed into a correlation matrix and variances (see section 20.15.3, "Covariance-Correlation Matrix Decomposition", of https://jrnold.github.io/bayesian_notes/appendix.html):

- A covariance matrix can be decomposed into a correlation matrix and a vector of variances.
- The variances can be further decomposed into the product of a simplex vector (a probability vector: non-negative and summing to 1) and the trace of the matrix (the sum of its diagonal elements, equal to the product of the order of the matrix and the square of the scale parameter). Each element of the simplex vector represents the proportion of the trace attributable to the corresponding variable.
- A prior on all of the above is the `decov` (decomposition of covariance) function:
  - The prior on the correlation matrix is the LKJ prior, whose density is proportional to the determinant of the correlation matrix raised to the power of the positive regularization parameter minus one.
  - The prior on the simplex vector is a symmetric Dirichlet prior, which has a single (positive) concentration parameter (the default of 1 implies the prior is jointly uniform over the space of simplex vectors of that size).
  - The positive scale parameter has a gamma prior (with default shape and scale of 1, implying a unit-exponential distribution).
- Alternatively, the `lkj` prior can be used for the covariance:
  - As with `decov`, it decomposes the covariance into a correlation matrix and variances; however, the variances are not further decomposed into a simplex vector and trace.
  - Instead, the standard deviations (square roots of the variances) of the group-specific parameters are given half Student-t distributions with scale and df parameters specified through the `scale` (default 10) and `df` (default 1) arguments of the `lkj` function.
  - The `lkj` prior is similar to, yet faster than, `decov`.
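The decomposition described above can be verified numerically in base R (using an arbitrary illustrative covariance matrix, not one taken from the tobacco model):

```r
# Decompose a covariance matrix into standard deviations and a correlation
# matrix, then further decompose the variances into a simplex and the trace.
Sigma <- matrix(c(4.0, 1.2,
                  1.2, 9.0), nrow = 2, byrow = TRUE) # illustrative covariance
sds   <- sqrt(diag(Sigma))                           # standard deviations
Omega <- diag(1 / sds) %*% Sigma %*% diag(1 / sds)   # correlation matrix
all.equal(diag(sds) %*% Omega %*% diag(sds), Sigma)  # Sigma = D(sd) Omega D(sd)
## Further decomposition used by decov():
tr      <- sum(diag(Sigma)) # trace = sum of the variances
simplex <- diag(Sigma) / tr # non-negative, sums to 1
c(sum(simplex), tr)
```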
In rstanarm, the default priors are designed to be weakly informative. They are chosen to provide moderate regularization (to help prevent overfitting) and to help stabilise the computations.
tobacco.rstanarm <- stan_glmer(NUMBER ~ (1 | LEAF) + TREATMENT,
data = tobacco,
family = gaussian(),
iter = 5000,
warmup = 2000,
chains = 3,
thin = 5,
refresh = 0
)
tobacco.rstanarm %>% prior_summary()
## Priors for model '.'
## ------
## Intercept (after predictors centered)
## Specified prior:
## ~ normal(location = 31, scale = 2.5)
## Adjusted prior:
## ~ normal(location = 31, scale = 16)
##
## Coefficients
## Specified prior:
## ~ normal(location = 0, scale = 2.5)
## Adjusted prior:
## ~ normal(location = 0, scale = 32)
##
## Auxiliary (sigma)
## Specified prior:
## ~ exponential(rate = 1)
## Adjusted prior:
## ~ exponential(rate = 0.15)
##
## Covariance
## ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
## ------
## See help('prior_summary.stanreg') for more details
This tells us:
2.5 * sd(tobacco$NUMBER)
## [1] 16.35115
2.5 * sd(tobacco$NUMBER) / sd(model.matrix(~TREATMENT, tobacco)[, 2])
## [1] 31.66386
In rstanarm, this is an exponential with a rate of 1, which is then adjusted by division by the standard deviation of the response.

1 / sd(tobacco$NUMBER)

## [1] 0.1528945
Let's now run with priors only so that we can explore the range of values they allow in the posteriors.
tobacco.rstanarm1 <- update(tobacco.rstanarm, prior_PD = TRUE)
ggpredict(tobacco.rstanarm1) %>% plot(add.data = TRUE)
## $TREATMENT
Conclusions:
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be autoscaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
I will also overlay the raw data for comparison.
tobacco.rstanarm2 <- stan_glmer(NUMBER ~ (1 | LEAF) + TREATMENT,
data = tobacco,
family = gaussian(),
prior_intercept = normal(35, 10, autoscale = FALSE),
prior = normal(0, 10, autoscale = FALSE),
prior_aux = rstanarm::exponential(0.1, autoscale = FALSE),
prior_covariance = decov(1, 1, 1, 1),
prior_PD = TRUE,
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0
)
tobacco.rstanarm2 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $TREATMENT
In brms, the default priors are designed to be weakly informative. They are chosen to provide moderate regularization (to help prevent overfitting) and to help stabilise the computations.
Unlike rstanarm, brms models must be compiled before they start sampling. For most models, the compilation of the Stan code takes around 45 seconds.
tobacco.form <- bf(NUMBER ~ (1|LEAF) + TREATMENT,
family = gaussian()
)
options(width=100)
tobacco.form %>% get_prior(data=tobacco)
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b TREATMENTWeak (vectorized)
## student_t(3, 33.1, 6.5) Intercept default
## student_t(3, 0, 6.5) sd default
## student_t(3, 0, 6.5) sd LEAF (vectorized)
## student_t(3, 0, 6.5) sd Intercept LEAF (vectorized)
## student_t(3, 0, 6.5) sigma default
## tobacco.brm <- brm(tobacco.form,
## iter = 5000,
## warmup = 1000,
## chains = 3,
## thin = 5,
## refresh = 0)
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be autoscaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
Note, for hierarchical models, the model will tend to want to have a large \(\sigma\) in order to fit the data better. It is a good idea to regularise this tendency by applying a prior that has most of its mass around zero. Suitable candidates include half-Student-t, gamma, exponential and half-Cauchy distributions (several of which are visualised below):
I will also overlay the raw data for comparison.
tobacco %>%
group_by(TREATMENT) %>%
summarise(
median(NUMBER),
mad(NUMBER)
)
standist::visualize("normal(35,5)", xlim = c(-10, 100))
standist::visualize("student_t(3, 0, 6.5)", xlim = c(-10, 100))
standist::visualize("student_t(3, 0, 6.5)",
"gamma(2,1)",
"gamma(2,0.5)",
"gamma(5,0.1)",
"exponential(1)",
"cauchy(0,2)",
xlim = c(-10, 25)
)
priors <- prior(normal(35, 20), class = "Intercept") +
prior(normal(0, 10), class = "b") +
prior(gamma(2, 0.5), class = "sigma") +
prior(cauchy(0, 2), class = "sd")
tobacco.form <- bf(NUMBER ~ (1 | LEAF) + TREATMENT,
family = gaussian()
)
tobacco.brm2 <- brm(tobacco.form,
data = tobacco,
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0
)
tobacco.form <- bf(NUMBER ~ (TREATMENT | LEAF) + TREATMENT,
family = gaussian()
)
tobacco.brm3 <- brm(tobacco.form,
data = tobacco,
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0,
control = list(adapt_delta = 0.99)
)
(l.1 <- tobacco.brm2 %>% loo())
##
## Computed from 2400 by 16 log-likelihood matrix
##
## Estimate SE
## elpd_loo -50.6 2.7
## p_loo 6.0 1.4
## looic 101.2 5.4
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 4 25.0% 716
## (0.5, 0.7] (ok) 7 43.8% 226
## (0.7, 1] (bad) 5 31.2% 55
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
(l.2 <- tobacco.brm3 %>% loo())
##
## Computed from 2400 by 16 log-likelihood matrix
##
## Estimate SE
## elpd_loo -50.5 2.9
## p_loo 7.7 1.8
## looic 101.1 5.8
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 3 18.8% 296
## (0.5, 0.7] (ok) 10 62.5% 154
## (0.7, 1] (bad) 3 18.8% 18
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
loo_compare(l.1, l.2)
## elpd_diff se_diff
## . 0.0 0.0
## . -0.1 0.3
tobacco.brm3 %>% get_variables()
## [1] "b_Intercept" "b_TREATMENTWeak"
## [3] "sd_LEAF__Intercept" "sd_LEAF__TREATMENTWeak"
## [5] "cor_LEAF__Intercept__TREATMENTWeak" "sigma"
## [7] "r_LEAF[L1,Intercept]" "r_LEAF[L2,Intercept]"
## [9] "r_LEAF[L3,Intercept]" "r_LEAF[L4,Intercept]"
## [11] "r_LEAF[L5,Intercept]" "r_LEAF[L6,Intercept]"
## [13] "r_LEAF[L7,Intercept]" "r_LEAF[L8,Intercept]"
## [15] "r_LEAF[L1,TREATMENTWeak]" "r_LEAF[L2,TREATMENTWeak]"
## [17] "r_LEAF[L3,TREATMENTWeak]" "r_LEAF[L4,TREATMENTWeak]"
## [19] "r_LEAF[L5,TREATMENTWeak]" "r_LEAF[L6,TREATMENTWeak]"
## [21] "r_LEAF[L7,TREATMENTWeak]" "r_LEAF[L8,TREATMENTWeak]"
## [23] "prior_Intercept" "prior_b"
## [25] "prior_sigma" "prior_sd_LEAF"
## [27] "prior_cor_LEAF" "lp__"
## [29] "accept_stat__" "stepsize__"
## [31] "treedepth__" "n_leapfrog__"
## [33] "divergent__" "energy__"
tobacco.brm3 %>%
hypothesis("TREATMENTWeak=0") %>%
plot()
tobacco.brm3 %>%
posterior_samples() %>%
dplyr::select(-`lp__`) %>%
pivot_longer(everything(), names_to = "key") %>%
filter(!str_detect(key, "^r")) %>%
mutate(
Type = ifelse(str_detect(key, "prior"), "Prior", "Posterior"),
## Class = ifelse(str_detect(key, 'Intercept'), 'Intercept',
## ifelse(str_detect(key, 'b'), 'b', 'sigma')),
Class = case_when(
str_detect(key, "(^b|^prior).*Intercept$") ~ "Intercept",
str_detect(key, "b_TREATMENT|prior_b") ~ "TREATMENT",
str_detect(key, "sd") ~ "sd",
str_detect(key, "^cor|prior_cor") ~ "cor",
str_detect(key, "sigma") ~ "sigma"
),
Par = str_replace(key, "b_", "")
) %>%
ggplot(aes(x = Type, y = value, color = Par)) +
stat_pointinterval(position = position_dodge()) +
facet_wrap(~Class, scales = "free")
In addition to the regular model diagnostics checking, for Bayesian analyses, it is also necessary to explore the MCMC sampling diagnostics to be sure that the chains are well mixed and have converged on a stable posterior.
There are a wide variety of tests, ranging from big-picture, overall chain characteristics to very specific, detailed tests that allow the experienced modeller to drill down to the fine details of chain behaviour. Furthermore, there are a multitude of packages and approaches for exploring these diagnostics.
The bayesplot package offers a range of MCMC diagnostics as well as Posterior Predictive Checks (PPC), all of which have a convenient plot() interface. Let's start with the MCMC diagnostics.
available_mcmc()
## bayesplot MCMC module:
## mcmc_acf
## mcmc_acf_bar
## mcmc_areas
## mcmc_areas_data
## mcmc_areas_ridges
## mcmc_areas_ridges_data
## mcmc_combo
## mcmc_dens
## mcmc_dens_chains
## mcmc_dens_chains_data
## mcmc_dens_overlay
## mcmc_hex
## mcmc_hist
## mcmc_hist_by_chain
## mcmc_intervals
## mcmc_intervals_data
## mcmc_neff
## mcmc_neff_data
## mcmc_neff_hist
## mcmc_nuts_acceptance
## mcmc_nuts_divergence
## mcmc_nuts_energy
## mcmc_nuts_stepsize
## mcmc_nuts_treedepth
## mcmc_pairs
## mcmc_parcoord
## mcmc_parcoord_data
## mcmc_rank_hist
## mcmc_rank_overlay
## mcmc_recover_hist
## mcmc_recover_intervals
## mcmc_recover_scatter
## mcmc_rhat
## mcmc_rhat_data
## mcmc_rhat_hist
## mcmc_scatter
## mcmc_trace
## mcmc_trace_data
## mcmc_trace_highlight
## mcmc_violin
Of these, we will focus on:
tobacco.brm3 %>% mcmc_plot(type = "trace")
The chains appear well mixed and very similar
tobacco.brm3 %>% mcmc_plot(type = "acf_bar")
There is no evidence of autocorrelation in the MCMC samples
tobacco.brm3 %>% mcmc_plot(type = "rhat_hist")
All Rhat values are below 1.05, suggesting the chains have converged.
neff_hist (number of effective samples): the ratio of the effective sample size (the number of effectively independent samples once autocorrelation is accounted for) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the samples are highly autocorrelated and therefore carry substantially less information than their raw number implies.
If the ratios are low, tightening the priors may help.
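The effective-sample ratio can be illustrated with the coda package on a simulated autocorrelated (AR(1)) series standing in for an MCMC chain; the numbers here are purely illustrative:

```r
# The effective sample size of an autocorrelated chain is well below its
# length, so the ratio flags the autocorrelation.
set.seed(1)
chain <- as.numeric(arima.sim(list(ar = 0.7), n = 2000)) # AR(1) "chain"
ess   <- coda::effectiveSize(coda::mcmc(chain))
ratio <- unname(ess) / length(chain)
ratio # substantially less than 1
```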
tobacco.brm2 %>% mcmc_plot(type = "neff_hist")
Ratios all very high.
tobacco.brm3 %>% mcmc_plot(type = "combo")
tobacco.brm3 %>% mcmc_plot(type = "violin")
The rstan package also offers a range of MCMC diagnostics. Let's start with the trace plots.
Of these, we will focus on:
tobacco.brm3 %>% get_variables()
## [1] "b_Intercept" "b_TREATMENTWeak"
## [3] "sd_LEAF__Intercept" "sd_LEAF__TREATMENTWeak"
## [5] "cor_LEAF__Intercept__TREATMENTWeak" "sigma"
## [7] "r_LEAF[L1,Intercept]" "r_LEAF[L2,Intercept]"
## [9] "r_LEAF[L3,Intercept]" "r_LEAF[L4,Intercept]"
## [11] "r_LEAF[L5,Intercept]" "r_LEAF[L6,Intercept]"
## [13] "r_LEAF[L7,Intercept]" "r_LEAF[L8,Intercept]"
## [15] "r_LEAF[L1,TREATMENTWeak]" "r_LEAF[L2,TREATMENTWeak]"
## [17] "r_LEAF[L3,TREATMENTWeak]" "r_LEAF[L4,TREATMENTWeak]"
## [19] "r_LEAF[L5,TREATMENTWeak]" "r_LEAF[L6,TREATMENTWeak]"
## [21] "r_LEAF[L7,TREATMENTWeak]" "r_LEAF[L8,TREATMENTWeak]"
## [23] "prior_Intercept" "prior_b"
## [25] "prior_sigma" "prior_sd_LEAF"
## [27] "prior_cor_LEAF" "lp__"
## [29] "accept_stat__" "stepsize__"
## [31] "treedepth__" "n_leapfrog__"
## [33] "divergent__" "energy__"
pars <- tobacco.brm3 %>% get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") %>% na.omit()
tobacco.brm3$fit %>%
stan_trace(pars = pars)
The chains appear well mixed and very similar
tobacco.brm3$fit %>%
stan_ac(pars = pars)
There is no evidence of autocorrelation in the MCMC samples
tobacco.brm3$fit %>% stan_rhat()
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the effective sample size (the number of effectively independent samples once autocorrelation is accounted for) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the samples are highly autocorrelated and therefore carry substantially less information than their raw number implies.
If the ratios are low, tightening the priors may help.
tobacco.brm2$fit %>% stan_ess()
Ratios all very high.
tobacco.brm3$fit %>%
stan_dens(separate_chains = TRUE, pars = pars)
The ggmcmc package also has a set of MCMC diagnostic functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
tobacco.ggs <- tobacco.brm3 %>% ggs(burnin = FALSE, inc_warmup = FALSE)
tobacco.ggs %>% ggs_traceplot()
The chains appear well mixed and very similar
ggs_autocorrelation(tobacco.ggs)
There is no evidence of autocorrelation in the MCMC samples
ggs_Rhat(tobacco.ggs)
All Rhat values are below 1.05, suggesting the chains have converged.
ggs_effective (number of effective samples): the ratio of the effective sample size (the number of effectively independent samples once autocorrelation is accounted for) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the samples are highly autocorrelated and therefore carry substantially less information than their raw number implies.
If the ratios are low, tightening the priors may help.
ggs_effective(tobacco.ggs)
Ratios all very high.
ggs_crosscorrelation(tobacco.ggs)
ggs_grb(tobacco.ggs)
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
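The idea behind these checks can be sketched in base R without a fitted model: compare a test statistic of the observed data against its distribution across replicate data sets simulated from the model (here a hypothetical Normal stand-in, with purely illustrative values):

```r
# Hand-rolled posterior predictive check on a test statistic (the sd).
set.seed(1)
y_obs <- rnorm(16, mean = 30, sd = 5)                  # stand-in observed data
yrep  <- replicate(1000, rnorm(16, mean = 30, sd = 5)) # replicate data sets
stat_obs <- sd(y_obs)
stat_rep <- apply(yrep, 2, sd)
ppp <- mean(stat_rep >= stat_obs) # posterior predictive p-value
# Values near 0 or 1 would indicate the model fails to reproduce this
# aspect of the data; mid-range values are consistent with the data.
```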
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_data
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_data
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_data
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_data
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
tobacco.brm3 %>% pp_check(type = "dens_overlay", nsamples = 100)
The model draws appear to be consistent with the observed data.
tobacco.brm3 %>% pp_check(type = "error_scatter_avg")
This is not really interpretable
tobacco.brm3 %>% pp_check(group = "TREATMENT", type = "intervals")
The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.
# library(shinystan)
# launch_shinystan(tobacco.brm2)
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
preds <- tobacco.brm3 %>% posterior_predict(nsamples = 250)
tobacco.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = tobacco$NUMBER,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = FALSE
)
plot(tobacco.resids, quantreg = FALSE)
Conclusions:
tobacco.brm3 %>%
conditional_effects() %>%
plot(points = TRUE)
tobacco.brm3 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $TREATMENT
tobacco.brm3 %>%
ggemmeans(~TREATMENT) %>%
plot(add.data = TRUE) +
geom_point(data = tobacco, aes(y = NUMBER, x = as.numeric(TREATMENT)))
Partial.obs <- tobacco.brm3$data %>%
mutate(
Pred = predict(tobacco.brm3)[, "Estimate"],
Resid = resid(tobacco.brm3)[, "Estimate"],
Obs = Pred + Resid
)
tobacco.brm3 %>%
fitted_draws(newdata = tobacco) %>%
median_hdci() %>%
ggplot(aes(x = TREATMENT, y = .value)) +
geom_pointrange(aes(ymin = .lower, ymax = .upper)) +
geom_line() +
geom_point(
data = Partial.obs, aes(y = Obs, x = TREATMENT), color = "red",
position = position_nudge(x = 0.1)
) +
geom_point(
data = tobacco, aes(y = NUMBER, x = TREATMENT),
position = position_nudge(x = 0.05)
)
brms captures the MCMC samples from stan within the returned list. There are numerous ways to retrieve and summarise these samples. The first three provide convenient numeric summaries from which you can draw conclusions, the last four provide ways of obtaining the full posteriors.
The summary() method generates simple summaries (means, standard deviations and 95% credible intervals), along with convergence diagnostics (Rhat and effective sample sizes).
tobacco.brm3 %>% summary()
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: NUMBER ~ (TREATMENT | LEAF) + TREATMENT
## Data: tobacco (Number of observations: 16)
## Draws: 3 chains, each with iter = 1000; warmup = 200; thin = 5;
## total post-warmup draws = 480
##
## Group-Level Effects:
## ~LEAF (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 2.63 1.82 0.12 6.62 1.00 1539
## sd(TREATMENTWeak) 2.04 1.89 0.06 6.75 1.00 1603
## cor(Intercept,TREATMENTWeak) 0.02 0.56 -0.94 0.95 1.00 2025
## Tail_ESS
## sd(Intercept) 1838
## sd(TREATMENTWeak) 1413
## cor(Intercept,TREATMENTWeak) 2122
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 34.75 1.99 30.74 38.59 1.00 2288 2102
## TREATMENTWeak -7.41 2.45 -11.91 -2.32 1.00 2228 1718
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.38 1.26 1.98 7.09 1.00 1223 897
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
tobacco.brm3$fit %>%
tidyMCMC(
estimate.method = "median",
conf.int = TRUE, conf.method = "HPDinterval",
rhat = TRUE, ess = TRUE
)
Conclusions:
see above
Rather than naming each parameter individually, it is easier to select the parameters of interest using the regex version.
tobacco.brm3 %>% get_variables()
## [1] "b_Intercept" "b_TREATMENTWeak"
## [3] "sd_LEAF__Intercept" "sd_LEAF__TREATMENTWeak"
## [5] "cor_LEAF__Intercept__TREATMENTWeak" "sigma"
## [7] "r_LEAF[L1,Intercept]" "r_LEAF[L2,Intercept]"
## [9] "r_LEAF[L3,Intercept]" "r_LEAF[L4,Intercept]"
## [11] "r_LEAF[L5,Intercept]" "r_LEAF[L6,Intercept]"
## [13] "r_LEAF[L7,Intercept]" "r_LEAF[L8,Intercept]"
## [15] "r_LEAF[L1,TREATMENTWeak]" "r_LEAF[L2,TREATMENTWeak]"
## [17] "r_LEAF[L3,TREATMENTWeak]" "r_LEAF[L4,TREATMENTWeak]"
## [19] "r_LEAF[L5,TREATMENTWeak]" "r_LEAF[L6,TREATMENTWeak]"
## [21] "r_LEAF[L7,TREATMENTWeak]" "r_LEAF[L8,TREATMENTWeak]"
## [23] "prior_Intercept" "prior_b"
## [25] "prior_sigma" "prior_sd_LEAF"
## [27] "prior_cor_LEAF" "lp__"
## [29] "accept_stat__" "stepsize__"
## [31] "treedepth__" "n_leapfrog__"
## [33] "divergent__" "energy__"
tobacco.draw <- tobacco.brm3 %>%
gather_draws(`b.Intercept.*|b_TREAT.*`, regex = TRUE)
tobacco.draw
We can then summarise this
tobacco.draw %>% median_hdci()
tobacco.brm3 %>%
gather_draws(`b_Intercept.*|b_TREAT.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_slab(aes(
x = .value, y = .variable,
fill = stat(ggdist::cut_cdf_qi(cdf,
.width = c(0.5, 0.8, 0.95),
labels = scales::percent_format()
))
), color = "black") +
scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)
tobacco.brm3 %>%
gather_draws(`.Intercept.*|.*TREAT.*`, regex = TRUE) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
facet_wrap(~.variable, scales = "free")
tobacco.brm3 %>%
gather_draws(`.Intercept.*|.*TREAT.*`, regex = TRUE) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
theme_classic()
We could alternatively summarise the parameters directly on the response scale (for this Gaussian model with an identity link, the parameters are already on the response scale).
tobacco.brm3 %>%
gather_draws(`b.Intercept.*|.*TREAT.*`, regex = TRUE) %>%
group_by(.variable) %>%
median_hdci()
tobacco.brm3 %>%
gather_draws(`b.Intercept.*|.*TREAT.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_slab(aes(
x = .value, y = .variable,
fill = stat(ggdist::cut_cdf_qi(cdf,
.width = c(0.5, 0.8, 0.95),
labels = scales::percent_format()
))
), color = "black") +
scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)
Conclusions:
tobacco.brm3$fit %>% plot(type = "intervals")
This is purely a graphical depiction of the posteriors.
tobacco.brm3 %>% tidy_draws()
tobacco.brm3 %>% spread_draws(`.*Intercept.*|.*TREAT.*`, regex = TRUE)
tobacco.brm3 %>%
posterior_samples() %>%
as_tibble()
tobacco.brm3 %>%
bayes_R2(re.form = NA, summary = FALSE) %>%
median_hdci()
tobacco.brm3 %>%
bayes_R2(re.form = ~ (1 | LEAF), summary = FALSE) %>%
median_hdci()
tobacco.brm3 %>%
bayes_R2(re.form = ~ (TREATMENT | LEAF), summary = FALSE) %>%
median_hdci()
newdata <- tobacco.brm3 %>%
emmeans(~TREATMENT) %>%
gather_emmeans_draws() %>%
pivot_wider(names_from = TREATMENT, values_from = .value) %>%
mutate(
Eff = Strong - Weak,
PEff = 100 * Eff / Weak
)
newdata %>% median_hdci(PEff)
newdata %>% summarise(P = sum(PEff > 0) / n())
newdata %>% summarise(P = sum(PEff > 20) / n())
newdata %>%
dplyr::select(-.chain, -.iteration) %>%
hypothesis("PEff>20")
## Hypothesis Tests for class :
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (PEff)-(20) > 0 7.71 10.67 -9.14 25.1 3.71 0.79
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
newdata <- tobacco.brm3 %>%
emmeans(~TREATMENT) %>%
as.data.frame()
head(newdata)
ggplot(newdata, aes(y = emmean, x = TREATMENT)) +
geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD))
# OR
newdat <- tobacco %>% tidyr::expand(TREATMENT)
newdata <- tobacco.brm3 %>%
brms::posterior_epred(newdat, re_formula = NA) %>%
as.data.frame() %>%
rename_with(~ as.character(newdat$TREATMENT)) %>%
mutate(
Eff = Strong - Weak,
PEff = 100 * Eff / Weak
)
head(newdata)